Regression analysis is a common tool in data analysis, while fuzzy 
regression can be used to analyze uncertain or imprecise data. 
Manufacturing companies often have difficulty predicting their future 
income. Thus, a new approach is required for the prediction of future 
company income. This article analyzes manufacturing income using 
the multiple linear regression (MLR) model and two fuzzy linear regression 
(FLR) models proposed by Tanaka and Zolfaghari, respectively. To 
find the optimal FLR model, the degree of fitting (H) was adjusted 
between 0 and 1. The performance of the three models was measured 
using mean square error (MSE), mean absolute error (MAE) and mean 
absolute percentage error (MAPE). Detailed analysis showed that 
Zolfaghari’s FLR model with a degree of fitting of 0.025 outperformed the 
MLR model and Tanaka’s FLR model, giving the smallest error values. In 
conclusion, manufacturing income is directly related to six 
independent variables, while three independent variables are 
inversely related to it. Based on these results, the 
model appears to be suitable for predicting future manufacturing income. 
1. INTRODUCTION 


A regression model is one of the most popular statistical models used to determine the association 
between multivariate data [1]. This model is commonly used in the fields of applied sciences, computer 
science, social sciences, engineering, and economics [2]. Regression analysis is used when a response 
variable depends on one or more explanatory variables. It shows how the value of the response variable 
changes when one of the explanatory variables varies while the rest remain unchanged [3]. 

However, statistical modelling cannot be used on all data. Traditional methods cannot determine a result 
accurately when the data are vague. Thus, in studies of the association between dependent and independent 
variables, the regression method may be unable to predict precisely because of unforeseen events. In some 
situations the data are not normally distributed, particularly when predicting income, for example because of 
outliers or missing values. In such cases the existing method cannot estimate the results accurately, so 
alternative approaches are necessary. 
Many fields, especially engineering and technology, use fuzzy methods in the analysis of 
uncertain data [4]. In complex systems involving human estimation, fuzzy methods can be applied to analyze 
uncertain or imprecise relationships between response and explanatory variables [5]. First introduced by Tanaka, 
fuzzy linear regression (FLR) has demonstrated its usefulness in solving complex problems where many 
cases are difficult to quantify [6]. FLR can approximate the relationship between variables when the available 
information is insufficient or uncertain [7]. 

Zolfaghari, on the other hand, introduced an extension of the model that involves triangular fuzzy 
numbers (TFNs). This model considers membership functions (MFs) that are either symmetrical or 
asymmetrical. The model also takes the degree of fitting into account as a factor in parameter estimation [8]. 
Parameter estimation can be carried out in two ways: linear programming and the fuzzy least squares 
method. Previous studies have applied the multiple linear regression method together with fuzzy 
regression techniques in various fields of study [9]-[13]. 

A previous study proposed the least squares method as a common FLR model [14]. However, the 
model was sensitive to outliers, which led to inaccuracies. Another fuzzy model, based on 
least absolute deviation, was introduced as an alternative to address the outlier issue [15]. In addition, it works 
well on both symmetrical and non-symmetrical data. By applying the least absolute deviation approach, a model 
is constructed from matrices of fuzzy numbers. A study by [16] showed that the fuzzy model based 
on least absolute deviation performed better and was more structured than the least squares method. 

The aim of this study is to propose Zolfaghari’s FLR model with an adjusted degree of fitting (H) 
for estimating future manufacturing income. It is expected that the proposed model will prove to be the most 
suitable model for application in the industrial sector. Moreover, there are no assumptions to be 
checked before the model can be analyzed. The optimal model is obtained by adjusting the degree of 
fitting (H) to find the smallest error value. 


2. PROPOSED METHOD 

The research framework of this study is shown in Figure 1. The data were obtained from the 
Department of Statistics Malaysia (DoSM) and cover various industry sectors, including farming, fishing, 
mining, quarrying, manufacturing, construction, transport, and others [17]. Data filtering was performed 
and only the manufacturing sector was chosen for detailed analysis. The dataset has a total of 
nine explanatory variables: legal status (individual proprietorship, partnership, private, public, 
co-operative, others), ownership (Malaysian residents, non-Malaysian residents, joint), value of assets (total net 
book value), total employment, total salaries and wages paid, number of degree and above holders, number of 
diploma holders, number of Malaysian Certificate of Education (MCE) and below holders, and total 
expenditure. Income is the dependent variable [17]. 


[Figure 1: flowchart with the stages Data Collection; Multicollinearity Test; Multiple Linear Regression; Fuzzy Linear Regression (Tanaka Model, Zolfaghari Model); Adjustment of Degree of Fitting; MSE, MAE & MAPE; The Best Model] 

Figure 1. Research framework of proposed model 
Once data filtering was complete, data pre-processing steps such as outlier treatment, normality 
testing and multicollinearity testing were carried out. These tests are required to fulfill the 
assumptions of the model. Next, the dataset was fed into three algorithms: multiple linear regression 
(MLR), Tanaka’s FLR, and Zolfaghari’s FLR with an adjusted degree of fitting (H). Then, the 
error values were calculated using several performance indicators, namely mean square error (MSE), mean 
absolute error (MAE) and mean absolute percentage error (MAPE) [18]. The resulting MSE, MAE and MAPE 
were then compared to find the best predictive model for the industrial sector. 


3. RESEARCH METHOD 
3.1. Multiple linear regression (MLR) 

MLR is an extension of simple linear regression [19]. It is applied in many statistical 
analyses and evaluates the association between the dependent and 
independent variables [20]. The MLR model has two key assumptions: normality and the absence of 
multicollinearity among the explanatory variables. The Q-Q plot is used to check the normality of 
the response and explanatory variables [21]. Multicollinearity among the explanatory variables is then 
checked using the variance inflation factor (VIF) to avoid any dependency 
between variables. The MLR model is given in (1). 


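$$Y_r = \beta_0 + \beta_1 X_{r1} + \beta_2 X_{r2} + \dots + \beta_p X_{rp} + \varepsilon_r \tag{1}$$

where $r = 1, 2, \dots, N$, the response variable is $Y$, the explanatory variables are $X_1$ to $X_p$, and the 
regression coefficients are $\beta_0$ to $\beta_p$. The least squares method (LSM) is shown in (2). 

$$S(\beta_0, \beta_1, \beta_2, \dots, \beta_p) = S(\beta) = \sum_{r=1}^{N} \varepsilon_r^2 \tag{2}$$

From (1), $\varepsilon = Y - X\beta$. Then, $S(\beta)$ is shown in (3). 

$$S(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta \tag{3}$$

In LSM, the best fit is computed by minimizing $S(\beta)$. Differentiating $S(\beta)$ with respect 
to $\beta$ and setting the derivative at $\hat{\beta}$ equal to zero gives (4), 

$$\left.\frac{\partial S}{\partial \beta}\right|_{\hat{\beta}} = -2X^T Y + 2X^T X \hat{\beta} = 0 \tag{4}$$

then, the LS estimator is shown in (5), 

$$\hat{\beta} = (X^T X)^{-1} X^T Y \tag{5}$$

The fitted value $\hat{\beta}_0 + \hat{\beta}_1 X_{r1} + \hat{\beta}_2 X_{r2} + \dots + \hat{\beta}_p X_{rp}$ is denoted by $\hat{Y}_r$, and the residual is $e_r = Y_r - \hat{Y}_r$. 
Further details of the least squares estimator can be found in [19], [22]. 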
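
As an illustration of the estimator in (5), the following is a minimal sketch rather than the implementation used in this study. It assumes NumPy is available; the function name `ols_fit` and the toy data are hypothetical. The normal equations are solved directly instead of forming the matrix inverse, which gives the same result when $X^T X$ is invertible.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    X: (N, p) array of explanatory variables (without an intercept column).
    y: (N,) array of responses.
    Returns the coefficient vector, the fitted values and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    # Solve the normal equations (X^T X) b = X^T y, equivalent to (5).
    beta = np.linalg.solve(design.T @ design, design.T @ y)
    fitted = design @ beta
    residuals = y - fitted
    return beta, fitted, residuals


# Hypothetical toy data generated from y = 1 + 2*x1 - 3*x2 without noise.
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta_demo, fitted_demo, resid_demo = ols_fit(X_demo, y_demo)
```

Because the toy responses contain no noise, the recovered coefficients should match the generating values and the residuals should be numerically zero.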


3.2. Fuzzy linear regression (Tanaka) 

In 1982, FLR was first proposed by Tanaka. FLR analysis aims to explore the potential models that 
fit observed fuzzy data. Models usually differ in the formula used for fitting. The final FLR 
model is shown in (6). To obtain the fuzzy model, the fuzzy parameters are estimated by 
solving the linear programming problem in (7) [6], [23]. 

$$\tilde{Y} = A_0(\alpha_0, c_0) + A_1(\alpha_1, c_1)X_1 + A_2(\alpha_2, c_2)X_2 + \dots + A_k(\alpha_k, c_k)X_k \tag{6}$$

Linear programming problem, 

$$\min_{\alpha, c} \; c_0 + \dots + c_k$$

subject to 

$$\alpha^{t} x_q + (1 - H)\sum_{r} c_r |x_{qr}| \geq y_q + (1 - H)e_q$$

$$-\alpha^{t} x_q + (1 - H)\sum_{r} c_r |x_{qr}| \geq -y_q + (1 - H)e_q \tag{7}$$

where $H$ is the degree of fitting, $\alpha$ is the fuzzy center, $c$ is the fuzzy width and $e_q$ is the error. If the linear 
programming problem in (7) can be solved, the fitted model is considered satisfactory. 
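
The linear program in (7) can be handed to a general-purpose solver. The sketch below is one possible way to do this and is not the code used in this study. It assumes NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS backend) are available, that the widths are constrained to be non-negative, and that $e_q = 0$ when the observed outputs are crisp. The function name `tanaka_flr` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def tanaka_flr(X, y, H=0.5, e=None):
    """Solve the linear program in (7) for fuzzy centers and widths.

    X: (n, k) crisp explanatory data, y: (n,) observed outputs,
    H: degree of fitting with 0 <= H < 1,
    e: (n,) values of e_q; assumed to be zero for crisp outputs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # column of ones for A_0
    abs_design = np.abs(design)
    n_coef = design.shape[1]
    e = np.zeros(n) if e is None else np.asarray(e, dtype=float)

    # Decision vector z = [alpha_0..alpha_k, c_0..c_k]; minimise the sum of widths.
    cost = np.concatenate([np.zeros(n_coef), np.ones(n_coef)])

    # linprog expects A_ub @ z <= b_ub, so both ">=" constraints are negated.
    A_ub = np.vstack([
        np.hstack([-design, -(1 - H) * abs_design]),
        np.hstack([design, -(1 - H) * abs_design]),
    ])
    b_ub = np.concatenate([-y - (1 - H) * e, y - (1 - H) * e])

    # Centers are free; widths are assumed to be non-negative.
    bounds = [(None, None)] * n_coef + [(0, None)] * n_coef
    result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n_coef], result.x[n_coef:]


# Hypothetical toy data lying exactly on y = 1 + 2x, so no fuzziness is needed.
X_toy = [[1.0], [2.0], [3.0], [4.0]]
y_toy = [3.0, 5.0, 7.0, 9.0]
centers, widths = tanaka_flr(X_toy, y_toy, H=0.5)
```

For data that do not lie exactly on a line, the solver returns positive widths so that every observation falls inside the band of half-width $(1 - H)\sum_r c_r |x_{qr}|$ around the fitted center.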


3.3. Fuzzy linear regression (Zolfaghari) 

In 2014, Zolfaghari proposed an extension of Tanaka’s FLR model. The parameters of this 
model can be either symmetric or asymmetric. As in Tanaka’s model, 
the fuzzy parameters are determined by solving a linear 
programming problem. This study focuses on symmetric parameters, under which the fuzzy 
coefficients are assumed to be triangular fuzzy numbers. The final FLR model again takes the form shown in (6). The 
linear programming problem is shown in (8) [8]. 

Linear programming problem, 

$$\min \; 2 m s_0 + 2\sum_{q=1}^{k}\left[ s_q \sum_{r=1}^{m} |x_{qr}| \right]$$

subject to $s_q \geq 0$ and, 

$$(1 - H)s_0 + (1 - H)\sum_{q=1}^{k}\left( s_q |x_{qr}| \right) - a_0 - \sum_{q=1}^{k}\left( a_q x_{qr} \right) \geq -y_r$$

$$(1 - H)s_0 + (1 - H)\sum_{q=1}^{k}\left( s_q |x_{qr}| \right) + a_0 + \sum_{q=1}^{k}\left( a_q x_{qr} \right) \geq y_r \tag{8}$$

where $H$ is the degree of fitting, and $s$ and $a$ are the spread and the center of the symmetric triangular fuzzy 
numbers, respectively. 
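
To make the roles of the objective and the constraints in (8) concrete, the following sketch evaluates both for a given candidate set of centers and spreads. It is an illustration only: the function names, the tolerance and the toy data are hypothetical, and $m$ is assumed to be the number of observations. The two inequalities in (8) together require $|y_r - \text{center}_r| \leq (1 - H)\,\text{spread}_r$ for every observation.

```python
def zolfaghari_objective(spreads, X):
    """Objective of (8): 2*m*s_0 + 2 * sum_q s_q * sum_r |x_rq|.

    spreads: [s_0, s_1, ..., s_k]; X: list of m rows with k values each.
    """
    m = len(X)
    total = 2 * m * spreads[0]
    for q in range(1, len(spreads)):
        total += 2 * spreads[q] * sum(abs(row[q - 1]) for row in X)
    return total


def satisfies_constraints(centers, spreads, X, y, H, tol=1e-9):
    """Check the non-negativity and the two inequality constraints of (8)."""
    if any(s < 0 for s in spreads):
        return False
    for row, target in zip(X, y):
        center = centers[0] + sum(a * v for a, v in zip(centers[1:], row))
        spread = spreads[0] + sum(s * abs(v) for s, v in zip(spreads[1:], row))
        slack = (1 - H) * spread
        # First inequality: slack - center >= -y; second: slack + center >= y.
        if slack - center < -target - tol or slack + center < target - tol:
            return False
    return True


# Hypothetical toy data close to the line y = 1 + 2x.
X_toy = [[1.0], [2.0], [3.0]]
y_toy = [3.0, 5.2, 6.9]
centers_toy = [1.0, 2.0]
spreads_toy = [0.5, 0.0]

feasible_low_h = satisfies_constraints(centers_toy, spreads_toy, X_toy, y_toy, H=0.5)
feasible_high_h = satisfies_constraints(centers_toy, spreads_toy, X_toy, y_toy, H=0.9)
objective_value = zolfaghari_objective(spreads_toy, X_toy)
```

In this toy case the same spreads are feasible at H = 0.5 but not at H = 0.9, which shows why a larger degree of fitting forces wider spreads and therefore a larger objective value.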


3.4. Statistical performance measurements 

Statistical measures are used to evaluate the results of the analysis, to assess how well the 
prediction models generalize and to guard against overfitting. The main use of statistical performance measurements is to 
quantify how accurately a predictive model will perform in practice [20]. Various performance 
measurements can be used in statistical analysis. In this study, three are used. 


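Mean square error (MSE) is represented as in (9), 

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{9}$$

mean absolute error (MAE) is represented as in (10), 

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i| \tag{10}$$

mean absolute percentage error (MAPE) is represented as in (11), 

$$\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100 \tag{11}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $N$ is the number of observations. 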
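
The three measures in (9) to (11) translate directly into code. The sketch below uses only the Python standard library; the function names and the toy values are hypothetical. Note that MAPE is undefined when any actual value is zero.

```python
def mse(actual, predicted):
    """Mean square error, as in (9)."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, as in (10)."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as in (11); requires non-zero actual values."""
    return 100 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy values, not taken from the dataset.
actual_demo = [100.0, 200.0, 400.0]
predicted_demo = [110.0, 190.0, 380.0]

mse_demo = mse(actual_demo, predicted_demo)    # (100 + 100 + 400) / 3
mae_demo = mae(actual_demo, predicted_demo)    # (10 + 10 + 20) / 3
mape_demo = mape(actual_demo, predicted_demo)  # 100 * (0.10 + 0.05 + 0.05) / 3
```

Because MSE squares the errors, it is expressed in squared income units and is dominated by large deviations, which is why its values in the tables of Section 4 are many orders of magnitude larger than the MAE values.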


4. RESULTS AND DISCUSSION 
In this study, the proposed method and the existing methods were modeled by including all 
the significant variables. The MSE, MAE and MAPE values of each model were calculated, with 
the degree of fitting (H) of the FLR models adjusted from 0 to 1. The H value giving the smallest error 
was then selected. A detailed discussion of the analysis is given in the following sections. 
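
The selection rule described above amounts to taking the H with the smallest error. A minimal sketch follows; the function name `select_best_h` is hypothetical, and the dictionary holds only a subset of the rows reported in Table 5.

```python
def select_best_h(results, metric="MSE"):
    """Return the degree of fitting H whose chosen error metric is smallest.

    results maps each H value to a dict with keys "MSE", "MAE" and "MAPE".
    """
    return min(results, key=lambda h: results[h][metric])


# Subset of the rows reported in Table 5 (Zolfaghari model).
table5_subset = {
    0.025: {"MSE": 575629000000, "MAE": 154153.8335, "MAPE": 104.6929},
    0.05: {"MSE": 580762000000, "MAE": 154660.2431, "MAPE": 106.7944},
    0.1: {"MSE": 592114000000, "MAE": 156315.3180, "MAPE": 107.9295},
    0.5: {"MSE": 686064000000, "MAE": 168189.6212, "MAPE": 112.8879},
    0.9: {"MSE": 792142000000, "MAE": 182630.8518, "MAPE": 127.3443},
}

best_h = select_best_h(table5_subset)
```

For these rows the three metrics agree on the same H. When they disagree, one metric has to be chosen as the primary criterion.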


4.1. Multiple linear regression 

The MLR model was fitted to the nine explanatory variables that contribute to predicting 
manufacturing income. Of the two preliminary assumption tests, the normality result is shown in Figure 2. 
The Q-Q plot in Figure 2 shows that the data are approximately normally distributed, since the points lie close to a straight line. 
Next, the variance inflation factor (VIF) values from the multicollinearity test are shown in Table 1. 
All VIF values of the explanatory variables are less than 10, which 
indicates that the dependencies are not severe enough to cause multicollinearity [24], [25]. 
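
For reference, the VIF of an explanatory variable $x_j$ is $1/(1 - R_j^2)$, where $R_j^2$ is obtained by regressing $x_j$ on the remaining explanatory variables. The sketch below illustrates this computation under the assumption that NumPy is available; it is not the software used in this study, and the function name `vif` and the toy data are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X.

    X: (n, p) array of explanatory variables (without an intercept column).
    A perfectly collinear column gives a zero residual sum of squares and
    raises ZeroDivisionError.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    values = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        values.append(1.0 / (1.0 - r_squared))
    return values


# Hypothetical toy data: two orthogonal columns, then two nearly collinear ones.
vif_orthogonal = vif([[1, 1], [-1, 1], [1, -1], [-1, -1]])
vif_collinear = vif([[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]])
```

Orthogonal columns give a VIF of 1, while the nearly collinear pair gives values far above the threshold of 10 used in this section.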

Furthermore, the MLR analysis indicated that only six explanatory variables are significant for 
manufacturing income, as shown in Table 1. An independent variable is considered significant if its 
p-value < 0.05. Meanwhile, the correlation coefficient ($r$) is 0.993 and the coefficient of determination ($r^2$) is 0.987, 
which indicates a strong positive linear correlation between the X variables and Y. The error values are shown in 
Table 2, where MSE = 6620192158000, MAE = 191340.3873 and MAPE = 290.0394. 

The selected MLR model is given in (12), 

$$Y = 1689.652 + 166.250x_1 - 1046.874x_2 + 0.056x_3 - 196.453x_7 - 23.868x_8 + 1.055x_9 \tag{12}$$


[Figure 2: scatterplot, dependent variable Y; horizontal axis: Y; vertical axis: regression standardized predicted value] 

Figure 2. Normal Q-Q plot of MLR model 


Table 1. Coefficient (β) and variance inflation factor (VIF) values 
Variables     Coefficient (β)    VIF 
(Constant)    1689.652           - 
x1            166.250            1.018 
x2            -1046.874          1.051 
x3            0.056              1.226 
x7            -196.453           3.055 
x8            -23.868            3.237 
x9            1.055              1.936 


Table 2. Result of errors for MLR model 
MSE MAE MAPE 
6620192158000  191340.3873 290.0394 


4.2. Fuzzy linear regression (Tanaka) 

In Tanaka’s model, the degree of fitting of the FLR model was adjusted between 0 and 1 to 
obtain the smallest error values, as shown in Table 3. The best MSE, MAE and MAPE values are 
829657000000, 185519.8663 and 126.5645, respectively. The best model, obtained with H = 0.95 and 
involving all explanatory variables, is shown in (13). Table 4 shows $\alpha_q$, the fuzzy centre of each parameter, and $c_q$, the 
fuzziness of that parameter. 

$$Y = (4505, 0) + (553, 0)x_1 + (-5702, 0)x_2 + (-0.5711, 0)x_3 + (-37137, 0)x_4 + (1.7581, 0)x_5 + (37309, 0)x_6 + (37036, 0)x_7 + (37134, 0)x_8 + (2.5560, 43.8499)x_9 \tag{13}$$


Table 3. Result of measurement error for FLR (Tanaka model) 
H MSE MAE MAPE 
0.1 1081480000000 213432.401 148.1297 
0.2 1049280000000 210523.225 146.6079 
0.3 1028260000000 207442.1272 144.5161 
0.4 987919000000 204217.4769 143.5276 
0.5 958462000000 201065.4216 138.6407 
0.6 929897000000 197840.8844 136.2766 
0.7 899143000000 193323.8679 134.4598 
0.8 872273000000 190428.6030 132.8196 
0.825 864476000000 189695.2996 131.4377 
0.85 856749000000 188407.1895 129.2915 
0.9 843124000000 186978.1420 128.2866 
0.925 837978000000 186529.5707 127.7575 
*0.95 829657000000 185519.8663 126.5645 
0.99 26747600000000 2346763.969 5091.5502 
*Indicates the best result 


Table 4. Details of the antecedent parameters for FLR (Tanaka model) 
Fuzzy parameter    Fuzzy center, α_q    Fuzzy width, c_q 
A0                 4505                 0 
A1                 553                  0 
A2                 -5702                0 
A3                 0.5711               0 
A4                 -37137               0 
A5                 1.7581               0 
A6                 37309                0 
A7                 37036                0 
A8                 37134                0 
A9                 2.5560               43.8499 


4.3. Fuzzy linear regression (Zolfaghari) 

Zolfaghari’s model was also used to predict manufacturing income. The best MSE, MAE and MAPE 
values are 575629000000, 154153.8335 and 104.6929, respectively, as shown in Table 5. The best model, 
obtained with H = 0.025, is shown in (14). Table 6 shows $a_q$, the fuzzy centre of each parameter, and $c_q$, the 
fuzziness of that parameter. 


$$Y = (2961, 0) + (378, 0)x_1 + (-3507, 0)x_2 + (-0.4862, 0)x_3 + (-28925, 0)x_4 + (1.9951, 0)x_5 + (29029, 0)x_6 + (28664, 0)x_7 + (28918, 0)x_8 + (2.3061, 1.9699)x_9 \tag{14}$$
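
A fuzzy model such as (14) can be evaluated for a given set of crisp inputs by combining the centers and the widths separately. The sketch below assumes the usual arithmetic for symmetric triangular coefficients: the output center is $a_0 + \sum_q a_q x_q$ and the output width is $c_0 + \sum_q c_q |x_q|$. The function name `fuzzy_predict` and the input vector are hypothetical; the coefficients are those printed in (14).

```python
def fuzzy_predict(coefficients, x):
    """Evaluate a fuzzy linear model with symmetric triangular coefficients.

    coefficients: list of (center, width) pairs, intercept first.
    x: list of crisp explanatory values, one per non-intercept coefficient.
    Returns (lower bound, center, upper bound) of the fuzzy output.
    """
    if len(coefficients) != len(x) + 1:
        raise ValueError("expected one coefficient per input plus an intercept")
    inputs = [1.0] + list(x)
    center = sum(a * v for (a, _), v in zip(coefficients, inputs))
    spread = sum(c * abs(v) for (_, c), v in zip(coefficients, inputs))
    return center - spread, center, center + spread


# Coefficients (center, width) as printed in (14).
eq14 = [
    (2961, 0), (378, 0), (-3507, 0), (-0.4862, 0), (-28925, 0),
    (1.9951, 0), (29029, 0), (28664, 0), (28918, 0), (2.3061, 1.9699),
]

# Hypothetical input values for x1..x9, chosen only for illustration.
x_demo = [1, 1, 1000.0, 10, 500.0, 2, 3, 5, 800.0]
lower, center, upper = fuzzy_predict(eq14, x_demo)
```

Since only the last coefficient in (14) has a non-zero width, the width of the predicted interval depends solely on the magnitude of $x_9$.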


Table 5. Result of measurement error for FLR (Zolfaghari model) 
H         MSE             MAE            MAPE 
*0.025    575629000000    154153.8335    104.6929 
0.05      580762000000    154660.2431    106.7944 
0.1       592114000000    156315.3180    107.9295 
0.15      603390000000    157780.7371    107.6545 
0.175     608365000000    158271.2320    108.8825 
0.2       614717000000    159256.9142    109.7747 
0.3       638015000000    162232.2573    111.6189 
0.4       660062000000    164442.1358    111.5769 
0.5       686064000000    168189.6212    112.8879 
0.6       715225000000    173957.0441    121.1385 
0.7       740641000000    176370.6562    121.7880 
0.8       765726000000    179421.2831    124.6240 
0.9       792142000000    182630.8518    127.3443 
*Indicates the best result 

Table 6. Details of the antecedent parameters for FLR (Zolfaghari model) 
Fuzzy parameter    Fuzzy center, a_q    Fuzzy width, c_q 
A0                 2962                 0 
A1                 388                  0 
A2                 -3517                0 
A3                 -0.4863              0 
A4                 -28935               0 
A5                 1.9961               0 
A6                 29129                0 
A7                 28665                0 
A8                 28928                0 
A9                 2.3161               1.9599 


4.4. Summary of results 

Table 7 summarizes the experimental results for the MLR model, Tanaka’s FLR model and Zolfaghari’s FLR 
model. The performance of the three models was evaluated using MSE, MAE and MAPE values. Among 
the three, the FLR model proposed by Zolfaghari exhibits the best results, with the lowest MSE, MAE and MAPE. Figure 3 
shows the plot of actual and predicted values for manufacturing income. 


Table 7. Summary of error measurement for three models 
Models of Linear Regression    H        MSE              MAE            MAPE 
MLR                            -        6620192158000    191340.3873    290.0394 
FLR (Tanaka)                   0.95     829657000000     185519.8663    126.5645 
FLR (Zolfaghari)               0.025    575629000000     154153.8335    104.6929 

[Figure 3: scatter plot of actual versus predicted manufacturing income; horizontal axis: Predicted (Y)] 

Figure 3. The actual and predicted values for manufacturing income 


5. CONCLUSION 

To predict manufacturing income, three models were applied: the MLR, the FLR 
proposed by Tanaka, and the FLR proposed by Zolfaghari. The FLR model introduced by 
Zolfaghari with H = 0.025 appears to be the optimal model based on the MSE, MAE, and MAPE obtained using all nine 
explanatory variables. By contrast, only six explanatory variables were significant according to the MLR. 
Since the MLR has the highest error values of the three methods, it is not recommended as a guide. 
Zolfaghari's FLR model indicates that manufacturing income is positively related to legal status, total 
salaries and wages paid, number of degree and diploma holders, number of MCE and below holders, and total 
expenditure. Additionally, manufacturing income is inversely related to ownership, asset value and 
number of employees. A manufacturer can use this output as a guide to improve its earnings. 